Skip to content

Close out review debts: perf fix, dedup, latent-trap cleanup, Ladybug migration doc close-out#333

Merged
HumanBean17 merged 5 commits into
masterfrom
chore/review-debts
Jun 17, 2026
Merged

Close out review debts: perf fix, dedup, latent-trap cleanup, Ladybug migration doc close-out#333
HumanBean17 merged 5 commits into
masterfrom
chore/review-debts

Conversation

@HumanBean17

Copy link
Copy Markdown
Owner

Applies the four prioritized debts from the six-domain codebase review (no big refactor — the architecture was assessed sound; these are the bounded, high-value cleanups).

Debt 2 — is_ignored O(n²) on the index hot path

LayeredIgnore.is_ignored returned (bool, IgnoreLayer) and ran _winning_row (one GitIgnoreSpec rebuild per ignore-rule prefix) on every call. All 7 production callers discarded the layer; only diagnose-ignore needs source attribution, and diagnose_dict already computes it. is_ignored now returns a plain bool; the two tests that checked the layer migrate to diagnose_dict. On a repo with ~100 ignore rules this cuts ~5000 spec compilations per indexed file down to one.

Debt 4 — mechanical duplication (4a–4d)

  • ast_java: four byte-identical _codebase_*_inner_annotation_nodes walkers → one _inner_annotation_nodes(node, src, target_simple).
  • graph_enrich: three identical _route/_client/_async _hint_lookup helpers → one generic _hint_lookup (TypeVar).
  • ladybug_queries: find_callers / find_callees (~50-line near-twins) → one _walk_calls; public signatures preserved.
  • pr_analysis: drop the dead notes local in compute_risk (never appended; real notes come from analyze_pr_pipeline).

Deferred (4e): the meta() 5-level query cascade in ladybug_queries is not cleanly dead — it is reachable via direct LadybugGraph(...) construction that bypasses get()'s ontology gate, and test_feign_not_exoser relies on its fallback. Left in place; revisit after the migration fully settles.

Debt 3 — latent footguns

  • mcp_v2: 15 error branches passed hints=[] to NeighborsOutput / DescribeOutput, which have no hints field (real fields advisories / hints_structured default to []). pydantic silently dropped the kwarg — harmless today, but a ValidationError landmine the moment anyone adds extra="forbid". Removed (the models already default to empty).
  • build_ast_graph: the four brownfield layer names were spelled out 4×. Promoted _BROWNFIELD_LAYERS to the single source of truth; brownfield_strategies is now _BROWNFIELD_LAYERS | {codebase_client, codebase_producer}. Behavior-preserving — the two sets still differ deliberately (layer gate vs. stat counter); that relationship is now explicit instead of two unrelated literals.

Debt 1 — close out the LadybugDB migration

The KuzuDB→LadybugDB migration (#302) landed as code but its doc/string sweep and plan close-out were never finished. Completed here:

  • Docs: current-state Kuzu→LadybugDB, code_graph.kuzu.lbug, --kuzu-path--ladybug-path, kuzu_queries.pyladybug_queries.py, KuzuGraphLadybugGraph, kuzu_pathladybug_path; ontology 15 / 1617 (code was already 17). Across README, AGENTS, CONFIGURATION, CLI guide, verification checklist, CODEBASE_REQUIREMENTS, AGENT-GUIDE, PRODUCT-VISION, tests/README.
  • Factual fixes (from the parallel markdown freshness audit): README DECLARES_ROUTE (nonexistent edge)→EXPOSES; role list no longer lists PRODUCER (a node kind) and now includes COMPONENT / CONFIG / ENTITY; EMBEDDING_MODELSBERT_MODEL (the real env var). AGENT-GUIDE + SKILL route frameworks corrected to spring_mvc / webflux (kafka/rabbitmq/jms/stream are route kinds; feign is a client kind). PRODUCT-VISION CALLS is shipped, not planned. External citation titles left as Kuzu.
  • Shipped artifacts: install_data/{skills,agents} explorer copies re-synced from source (were behind — missing source_layer, schema-rejection note). Moved the landed PLAN / propose for LADYBUG-DB-MIGRATE and INDEX-OUTPUT-REWORK from active/ to completed/.
  • Source: docstring/help-string sweep only (cli/pr_analysis/mcp_v2/search_lancedb, conftest, test docstrings) — no behaviour change; the one stale kuzu 0.11.x version reference in mcp_v2 is genericized.

Verification

  • .venv/bin/ruff check . — clean.
  • .venv/bin/python -m pytest tests -q838 passed, 13 skipped (skips are the JAVA_CODEBASE_RAG_RUN_HEAVY-gated tests).

Notes

  • No re-index required — no schema, ontology, or ranking change (ontology stays 17; only docs that wrongly said 15/16 were corrected).
  • No env-var changeSBERT_MODEL was always the real variable; only the doc that said EMBEDDING_MODEL was fixed.
  • is_ignored return type changed from tuple[bool, IgnoreLayer|None] to bool (internal API; all callers updated). Breaking change is allowed per repo policy.

🤖 Generated with Claude Code

HumanBean17 and others added 5 commits June 15, 2026 22:51
is_ignored returned (bool, IgnoreLayer) and computed _winning_row (one
GitIgnoreSpec rebuild per ignore-rule prefix) on every call. All 7
production callers (iter_java_source_files + java_index_flow_lancedb)
discarded the layer; only diagnose-ignore needs source attribution, and
diagnose_dict already computes it. is_ignored now returns a plain bool;
the two test assertions that checked the layer's source migrate to
diagnose_dict. On a repo with ~100 ignore rules this cuts ~5000 spec
compilations per indexed file down to one.

Co-Authored-By: Claude <noreply@anthropic.com>
- ast_java: four byte-identical _codebase_*_inner_annotation_nodes walkers
  -> one _inner_annotation_nodes(node, src, target_simple).
- graph_enrich: three identical _route/_client/_async _hint_lookup helpers
  -> one generic _hint_lookup (TypeVar) used at all three call sites.
- ladybug_queries: find_callers/find_callees (~50-line near-twins) -> one
  _walk_calls helper; the two public methods keep their signatures and
  delegate, differing only in call-graph orientation.
- pr_analysis: drop the dead 'notes' local in compute_risk (never appended;
  the real notes are assembled by analyze_pr_pipeline and merged there).

meta()'s 5-level query cascade in ladybug_queries is intentionally left in
place: it is reachable via direct LadybugGraph(...) construction that bypasses
get()'s ontology gate, and test_feign_not_exoser relies on its fallback, so it
is not cleanly dead right after the Ladybug migration.

Co-Authored-By: Claude <noreply@anthropic.com>
…terals

mcp_v2: 15 error branches passed hints=[] to NeighborsOutput/DescribeOutput,
neither of which has a hints field (their fields are advisories /
hints_structured, both defaulting to []). pydantic silently dropped the kwarg;
today harmless, but the moment anyone adds extra='forbid' to those models every
error branch would raise ValidationError swallowed by the catch-all. The models
already default to empty, so the dead kwargs are simply removed.

build_ast_graph: the four brownfield layer names were spelled out four times
(_client_source_layer, _producer_source_layer, brownfield_strategies, and
_BROWNFIELD_LAYERS). Promote _BROWNFIELD_LAYERS to the single source of truth
and define brownfield_strategies as _BROWNFIELD_LAYERS plus the two caller-side
declaration strategies (codebase_client/codebase_producer). The two sets still
differ deliberately: _BROWNFIELD_LAYERS gates brownfield_only authoritativeness
(per edge), while brownfield_strategies counts annotation-declared callers in
the *_from_brownfield_pct stats — now that relationship is explicit instead of
two independent literals that looked unrelated.

Co-Authored-By: Claude <noreply@anthropic.com>
The KuzuDB->LadybugDB migration (commit #302) landed as code but its doc/string
sweep and plan close-out were never finished, leaving operator and agent docs
asserting the old store. This completes it:

Docs (current-state Kuzu -> LadybugDB, code_graph.kuzu -> .lbug, --kuzu-path
-> --ladybug-path, kuzu_queries.py -> ladybug_queries.py, KuzuGraph ->
LadybugGraph, kuzu_path -> ladybug_path; ontology 15/16 -> 17):
  README, AGENTS, docs/CONFIGURATION, docs/JAVA-CODEBASE-RAG-CLI,
  docs/MANUAL-VERIFICATION-CHECKLIST, docs/CODEBASE_REQUIREMENTS,
  docs/AGENT-GUIDE, docs/PRODUCT-VISION, tests/README.

Factual fixes surfaced by the markdown freshness audit:
  - README: DECLARES_ROUTE (nonexistent edge) -> EXPOSES; role list no longer
    lists PRODUCER (a node kind) and now includes COMPONENT/CONFIG/ENTITY;
    EMBEDDING_MODEL -> SBERT_MODEL (the real env var).
  - AGENT-GUIDE + SKILL: route frameworks corrected to spring_mvc/webflux
    (kafka/rabbitmq/jms/stream are route kinds; feign is a client kind).
  - PRODUCT-VISION: CALLS is shipped, not 'planned'. External citation titles
    (footnotes 12/17) intentionally left as 'Kuzu'.

Shipped-artifact resync + plan close-out:
  - install_data/{skills,agents} explorer copies re-synced from source (they
    were behind, missing source_layer and the schema-rejection note).
  - Moved the landed PLAN/propose for LADYBUG-DB-MIGRATE and INDEX-OUTPUT-REWORK
    from active/ to completed/.

Source docstring/help-string sweep only (cli/pr_analysis/mcp_v2/search_lancedb,
conftest, test_ladybug_queries docstrings) — no behaviour change; the one
clearly-stale kuzu 0.11.x version reference in mcp_v2 is genericized.

Co-Authored-By: Claude <noreply@anthropic.com>
@HumanBean17 HumanBean17 merged commit a08f6bc into master Jun 17, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant